Integration of Intonation in F0 Trajectory Prediction Using MSD-HMMs
Abstract
Current research in speech synthesis places increasing emphasis on spectral continuity and diverse prosodic effects. The trainable HMM-based speech synthesis method tends to generate more continuous spectral structures than the traditional unit selection method. However, the F0 trajectory generated by HMM-based speech synthesis is often excessively smoothed and lacks prosodic variance. This paper proposes an approach to improve F0 trajectory prediction in Mandarin speech synthesis within the framework of multi-space probability distribution HMMs (MSD-HMMs). In the proposed approach, the intonation, which is predicted by context-dependent decision trees, is integrated into the F0 trajectory generated by the MSD-HMMs as a weighted bias term. Experiments indicate that the approach yields an encouraging improvement in the prosodic effect of Mandarin speech synthesis.
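The weighted bias term described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `integrate_intonation`, the log-F0 domain, the representation of unvoiced frames as `None`, and the default weight of 0.5 are all hypothetical and not taken from the paper.

```python
from typing import List, Optional


def integrate_intonation(
    hmm_log_f0: List[Optional[float]],
    intonation_bias: List[float],
    weight: float = 0.5,
) -> List[Optional[float]]:
    """Add a weighted intonation bias to a per-frame F0 trajectory.

    hmm_log_f0: per-frame log-F0 values standing in for the MSD-HMM
        output; None marks an unvoiced frame (assumed representation).
    intonation_bias: per-frame intonation values standing in for the
        decision-tree prediction (assumed to be in the same domain).
    weight: scaling factor for the bias term (hypothetical default).
    """
    if len(hmm_log_f0) != len(intonation_bias):
        raise ValueError("trajectories must have the same number of frames")

    combined: List[Optional[float]] = []
    for base, bias in zip(hmm_log_f0, intonation_bias):
        if base is None:
            # Unvoiced frames carry no F0 value, so they are left untouched.
            combined.append(None)
        else:
            combined.append(base + weight * bias)
    return combined


if __name__ == "__main__":
    # Toy values only; they are not data from the paper.
    base_trajectory = [5.0, None, 5.2]
    bias_trajectory = [0.2, 0.1, -0.4]
    print(integrate_intonation(base_trajectory, bias_trajectory, weight=0.5))
```

With a weight of zero the sketch returns the base trajectory unchanged, while larger weights let the intonation contour move the F0 values further from the smoothed baseline.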
Similar resources
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based text-to-speech system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...
Generating natural F0 trajectory with additive trees
In HMM-based TTS, while the segmental quality of synthesized speech is quite acceptable, intonation, especially at the sentence level, tends to be somewhat bland. The maximum likelihood (ML) criterion used in HMM training and parameter trajectory generation is partially responsible for the blandness. Additionally, the F0 trajectory thus generated has a smaller dynamic range than that of natural...
Emotion conversion using F0 segment selection
This paper describes F0 segment selection, a novel syllable-based F0 conversion method, which provides a concatenative framework to search for F0 segments in a modest corpus of emotional speech (∼15 minutes of data). The method is compared with our earlier work on F0 generation using context-sensitive syllable HMMs. Both methods are complemented with a duration conversion module as well as GMM-ba...
Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure
Humans use expressive intonation to convey linguistic and paralinguistic meaning, in particular using focal prominence to emphasize the focus of speech. Automatic extraction of dynamic intonation features from a speech corpus, and their representation in a continuous form, is desirable in multilingual speech synthesis. This paper presents a method to extract dynamic prosodic structure ...
Automatic Intonation Event Detection Using Tilt Model for Croatian Speech Synthesis
Text-to-speech systems convert text into speech. Synthesized speech without prosody sounds unnatural and monotonous. In order to sound natural, prosodic elements have to be implemented. The generation of prosodic elements directly from text is a rather demanding task. Our final goals are building a complete prosodic model for Croatian and implementing it into our TTS system. In this work, we pr...